Re: [Ganglia-general] intermittent blanks in graphs

2006-01-24 Thread steven wagner
Rick Mohr wrote: On Mon, 23 Jan 2006, Ben Hartshorne wrote: snip When I go into the page for a single host and click on the 'gmetrics' link, I find that all of my metrics have a record of being received within the last two minutes (my time period). And yet, their graphs show up empty. Any

Re: [Ganglia-general] Gmetad and rrd

2005-11-23 Thread steven wagner
This, combined with your last message, makes it look like gmetad's not getting any data from gmond. That could be because gmond is not configured to accept connections from the gmetad host (yeah, even localhost!) or that there's some other major config wackiness going on. There doesn't

Re: [Ganglia-general] gmond - fully qualified host lookup

2005-08-16 Thread steven wagner
source code. Any other pointers will be helpful. Thanks, Utsav Agarwal On Tue, 16 Aug 2005 14:43:45 -0700, steven wagner [EMAIL PROTECTED] wrote: Change the order of the hosts entry to: IP address node1.domain.com node1 any-other

Re: [Ganglia-general] Re: Running as non-root on Solaris solved (I think)

2004-01-05 Thread steven wagner
Hmm, I must have been on vacation or something. Regardless, I don't have this code. And for the record, I never said I was happy about having to run gmond as root instead of nobody. :) Adeyemi Adesanya wrote: Hi Christopher. I have not received a response from Steven Wagner so I will sent

Re: [Ganglia-general] ganglia on tru64

2003-11-26 Thread steven wagner
Steve Feehan wrote: Although, I suppose it would be just as good to use lo0 as this is a one node cluster. So what is the recommendation for a one node cluster? Multicast is, well, is there a point? And what sort of route do I need? Yeah, feel free to use loopback for this. I am in the

Re: [Ganglia-general] ganglia on tru64

2003-11-25 Thread steven wagner
I'm sorry to report that you should be getting metric data back on Tru64. Sadly, I can't offer any developmental support here now because all our Alpha are belong to dumpster (although for the record, I am the one to blame for the monitoring core running on Tru64 to begin with... sorry about

Re: [Ganglia-general] cpustuff: Not enough space

2003-11-13 Thread steven wagner
dio wrote: Solaris 2.6 gmond v2.5.5 /usr/sbin/gmond daemon fails to start. /usr/sbin/gmond -d1 yields cpustuff: Not enough space not quite sure where the error is coming from. tnx, --dio The getkval() function is erroring out for some reason. Is the monitoring core running as root?

Re: [Ganglia-general] gmond start Error

2003-11-10 Thread steven wagner
Brooks Davis wrote: On Thu, Nov 06, 2003 at 10:02:07PM -0500, Krishna Kumar wrote: Hi, I installed ganglia on my solaris (with gmetad).. when I try to start the daemon, it gives me this error.. $ /gmond start /etc/rc.d/init.d/functions not found I've copied the gmond.init to /usr/sbin.. Is

Re: [Ganglia-general] Running as non-root on Solaris

2003-09-29 Thread steven wagner
Adeyemi Adesanya wrote: Hi There. I spent some time digging through the archives but I am unable to find a way of running gmond as a non-root user on Solaris. Is this out of the question or is there some way to patch the code? All of our critical servers run Solaris, that's where the real

Re: [Ganglia-general] New user gmond woes

2003-08-26 Thread steven wagner
I'm guessing that your new Ganglia cluster and your old Ganglia cluster are sending metrics out on the same multicast address. The fix is easy in one sense, but difficult in another (depending on the nature and quality of your cluster management tools): Change the multicast IP or port on one

Re: [Ganglia-general] Gmetad slowly collecting data

2003-08-06 Thread steven wagner
Marcia Prescott wrote: Thanks for the ideas. I ended up looking through the log file. It turns out that the machine I have the metadaemon had a time of 5 minutes faster. I suppose most people would have a network in sync. Anyway, because my metadaemon was ahead, it would create a rrd for a

Re: [Ganglia-general] Gmetad slowly collecting data

2003-08-01 Thread steven wagner
I have no answers, only vaguely-informed statements and half-formed questions (welcome to free software's version of tech support!). It is interesting to note that 4 hours = 16 real data points (at 15-minute polling intervals). That's a suspiciously round number... However, if this was just
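
A minimal sketch of the arithmetic behind that round number, assuming a fixed polling interval. The function name is illustrative and not part of Ganglia:

```python
def data_points(window_hours: float, interval_minutes: float) -> int:
    """Number of samples collected over a window at a fixed polling interval."""
    return int(window_hours * 60 // interval_minutes)


# 4 hours of data at 15-minute polling intervals, as in the message above.
print(data_points(4, 15))
```

The same helper makes it easy to see how many real data points other windows would hold at a different polling interval.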

Re: [Ganglia-general] Cluster

2003-07-31 Thread steven wagner
Dave Bradshaw wrote: Dear Steve, Thanks for the advice. I have done what you suggested. I have also turned off one of the gmetad daemons so it is now only running on the machine with the web frontend. Now when I fire up the web frontend I get the following message: - Ganglia cannot find

Re: [Ganglia-general] Cluster

2003-07-30 Thread steven wagner
Dave Bradshaw wrote: Dear All, Am I an idiot? My rule of thumb is not to ask this sort of question on a list unless I'm asking something already covered in the docs. There's always a chance some wisecracker out there will answer it. :) Where am I going wrong? Different clusters

Re: [Ganglia-general] gmond dying

2003-07-08 Thread steven wagner
I have no specific solutions for you but here are some potentially helpful tidbits which may permit you to shoot your own trouble: Does the monitoring core die right away? Does it dump core? Does it die when you run it in debug mode? Does debug mode tell you anything more about the error? Do

Re: [Ganglia-general] Tru64 v5.1A version of gmond

2003-07-08 Thread steven wagner
Hector M. Jacas wrote: Hello to all! I am looking for the way to build and to install a version of GMOND for Tru64 v5.1A. Last year, when we had a meaningful number of Alphas on the premises running Tru64, I ported the monitoring core to that platform. It is an experience I don't recall

Re: [Ganglia-general] nodes running gmond reporting incorrectly

2003-07-02 Thread steven wagner
Kevin James Flasch wrote: * Check some of the gmond-only nodes' XML port output. How many nodes do they see? Do they see 289-295 nodes or just their own output? I believe you're referring to the mcast_port (by default 8649). When I telnet to it, I see what appears to be all/most of them.

Re: [Ganglia-general] not implemented yet

2003-06-30 Thread steven wagner
Бойко Николай wrote: Hi all, I have added some metrics to ganglia, and tried to change the ganglia webfrontend code, so now it can show ALL metrics averaged by all nodes, in one page. Then you have no need to look on each node and test metrics. I have some success, but only to standard ganglia

Re: [Ganglia-general] Removing a machine

2003-06-30 Thread steven wagner
David Aikema wrote: Quoting steven wagner [EMAIL PROTECTED]: On a sufficiently new (>= 2.5.0, I believe...) monitoring core, metrics and hosts should expire according to their DMAX attributes. Restarting the monitoring cores which will be polled by your metadaemon will clear the hosts out

Re: [Ganglia-general] Trouble launching gmond

2003-05-07 Thread steven wagner
Ken MacInnis wrote: On Wed, 7 May 2003, David Bickle wrote: Still having problems I've compiled gcc 3.2.2 from source with the CPU=sparc64. I'm running Solaris 8. I have also compiled ganglia with --enable-sparc64. gmond still won't launch for some reason. Check this: bash-2.03$ file

Re: [Ganglia-general] Trouble launching gmond

2003-05-07 Thread steven wagner
as root. Why is it complaining about /dev/ksyms not being 32-bit? Am I missing a configure option? Thanks Again, On Wed, 7 May 2003, steven wagner wrote: Ken MacInnis wrote: On Wed, 7 May 2003, David Bickle wrote: Still having problems I've compiled gcc 3.2.2 from source

Re: [Ganglia-general] Display problems etc...

2003-04-14 Thread Steven Wagner
Make sure you're using the latest gmetad and web front-end. Latest version is 2.5.3, and it incorporates fixes to directly address both issues (a was addressed in 2.5.2, b in 2.5.3). I've been having trouble with gaps for months - check the ganglia-general archives for various musings on

Re: [Ganglia-general] gmetad on OSX

2003-04-08 Thread Steven Wagner
M. Michael Barmada wrote: Hi, I'm wondering if anyone has had success compiling gmetad on OSX? Even after getting everything else working (installing rrd through fink required some additional arguments to configure to get all the libraries recognized), 'make' keeps failing in the gmetad

Re: [Ganglia-general] gmetad on OSX

2003-04-08 Thread Steven Wagner
/sw/include could be a Fink include install directory. Fink defaults to putting installed and built software in /sw, IIRC ... (I'm not running it at the moment on my Powerbook, which needs a 10.2 upgrade...) matt massie wrote: Today, M. Michael Barmada wrote forth saying... I'm wondering

Re: [Ganglia-general] nodes reporting on each other

2003-04-01 Thread Steven Wagner
Hi Arnie, Sounds like you need to change some multicast IPs. All the nodes that you want to appear in a single cluster should have the same multicast IP. Despite your best efforts to explain it, I think you're probably the best person to determine how you want your grid layout to look. :)

Re: [Ganglia-general] Display problem

2003-03-26 Thread Steven Wagner
matt massie wrote: prashant- so when a node in the cluster dies the cluster size changes but the dead node is not reported? this is a new problem that i haven't heard of before. did gmond get restarted after the node failed? ganglia knows a node dies when it stops getting heartbeats

Re: [Ganglia-general] Cluster frontend not reporting

2003-03-11 Thread Steven Wagner
Leif Nixon wrote: Well, this is a new one - at least for me. One of our clusters was rebooted last week, due to a physical relocation. Now the ganglia XML data doesn't contain any mention of the cluster frontend, even though gmond is running fine and responding to the XML data port: nixon

Re: [Ganglia-general] Cluster frontend not reporting

2003-03-11 Thread Steven Wagner
Leif Nixon wrote: Steven Wagner [EMAIL PROTECTED] writes: That's how I found out that my front-end was *three* hops away from the test cluster and I'm thinking you have either a monitoring core config issue or a host/network config issue to track down... (maybe a host/network device between

Re: [Ganglia-general] Webfrontend graph's time resolution

2003-03-06 Thread Steven Wagner
Henry Leyh wrote: I cannot find anything unreasonable here. The polling interval seems to be correct. Note that we do not have private 192.168... addresses for the cluster nodes. Yup, all that looks reasonable. My grab bag o' fixes is officially empty. :) One thing I guess you could try is

Re: [Ganglia-general] a few questions

2003-03-05 Thread Steven Wagner
Santanu Das wrote: Actually I did mean to say how to change the label, like instead of Unspecified Grid something like HEP DataDrid or else. Did somebody say, undocumented feature ? gmetad and the web front-end control the grid stuff - this is a new feature addition as of 2.5.2, which was

Re: [Ganglia-general] grid graphs missing parts

2003-02-21 Thread Steven Wagner
Nicholas Henke wrote: OK -- so check this link, it is all of our clusters: http://www.liniac.upenn.edu/ganglia. Notice how the overall graph is spotty, but none of the others are? How do I fix that ? Nic Hard to conclusively say without putting gmetad into debug mode and sifting through a

Re: [Ganglia-general] solaris not reporting running processes

2003-01-30 Thread Steven Wagner
That metric isn't currently supported on Solaris. I have an idea of how to do it but I simply haven't had the time to work on it. Basically it involves walking the /proc tree looking for processes in the Run state and multicasting that number. If someone else wants to write the code for it,

Re: [Ganglia-general] ganglia-webfrontend

2003-01-28 Thread Steven Wagner
John Francis Lee wrote: Thanks again! Setting the debug level to 10 showed me that gmetad was unable to connect to itself! I changed the datasource specification to 'localhost' from the machine's FQDN and things worked! What I get now is 'There are 10 nodes up and running. There are no nodes

Re: [Ganglia-general] gmetric question

2003-01-28 Thread Steven Wagner
Joe Griffin wrote: Hi All, Is there any similar information on gmetric? I found a script I would like to use in number 16 of: http://ganglia.sourceforge.net/gmetric/ However, I cannot get gmetric to print any output. For example, I tried: /usr/bin/gmetric --name Resource_Usage_Rank 2 --value

Re: [Ganglia-general] ganglia-webfrontend

2003-01-27 Thread Steven Wagner
John Francis Lee wrote: Greetings, I've downloaded and installed the software to the machines in our internet cafe, have gmond running on all and gmetad on one. When I try to view the setup with the ganglia-webfrontend I get a lot of messages on the order of: Warning: ksort() expects

Re: [Ganglia-general] ganglia-webfrontend

2003-01-27 Thread Steven Wagner
John Francis Lee wrote: Thanks for the help! I followed you suggestions and attach the output of each telnet command. Both were able to connect, and the machine running gmond responded with data. Maybe there's something wrong with php? Take another look at the metadaemon's output: [DTD

[Ganglia-general] web front-end: the phantom job view

2003-01-21 Thread Steven Wagner
I noticed in CVS some comments about a job view, allowing for a user-specified graph start time and duration. However there doesn't appear to be any kind of interface for it. I'm not afraid of rolling my own (in fact, I think it might be fun to roll that into another application

Re: [Ganglia-general] How to setup multiple clusters using different multicast IP

2003-01-15 Thread Steven Wagner
clusters. The monitoring core probably shouldn't be running on the front-end. The metadaemon should be enough. I know in previous reply, Steven Wagner has said that this should work, but I am not able to get it to behave that way. Am I missing something very obvious ? You know, gentle

Re: [Ganglia-general] cross platform gmond clusters

2002-12-18 Thread Steven Wagner
Lester Vecsey wrote: Looking through the key_metrics.h file it seems that linux machines get a different set of keys from aix, and so on. There's a basic core set of keys that are on all platforms, but then when it gets to things like pkts_in it's only available for linux. In particular pkts_in

Re: [Ganglia-general] webfrontend, graph.php, v=value

2002-11-06 Thread Steven Wagner
Lester Vecsey wrote: I find it useful to select certain graphs and copy/paste the URL to some of the images to call them from my own html page, and I noticed that the graphs have a '(now )' value that is passed in with the v= argument to graph.php. Certainly graph.php should be able to have

Re: [Ganglia-general] aix mem_free, 4.3

2002-10-29 Thread Steven Wagner
Lester Vecsey wrote: I was going to investigate this further to see exactly what kind of values the gmond process is coming up with in the relevant sections of code, but I thought I'd ask here. Also, does anyone know if IBM has a library for 4.3 for the vmgetinfo function? It's also mentioned in

Re: [Ganglia-general] webfrontend config question

2002-10-29 Thread Steven Wagner
Chris Stone wrote: Ganglia is great. I got it up and running on my linux cluster in short order. I do have one nagging detail I'd like to remedy. /var/lib/ganglia/rrds/ contains a directory called unspecified. My ganglia web page also lists this name as the name of the cluster, ie.

Re: [Ganglia-general] update rate

2002-10-28 Thread Steven Wagner
+ (a value between 120-150) } else do nothing /quotage Steven Wagner wrote: All those values are in seconds. The mcast_min/max values specify the range (randomly determined on each round of execution) of interval between TRANSMISSIONS of the metric. The other two values
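
A rough sketch of the interval selection described above, assuming the delay is drawn uniformly from the mcast_min/mcast_max range on each round. The function name is hypothetical, and this is not gmond's actual C code; only the 120-150 second range comes from the quoted text:

```python
import random


def next_transmission_delay(mcast_min: int, mcast_max: int, rng=random) -> int:
    """Pick the delay in seconds before a metric is next transmitted.

    Assumption: a uniform draw from [mcast_min, mcast_max], redone each round.
    """
    if mcast_min > mcast_max:
        raise ValueError("mcast_min must not exceed mcast_max")
    return rng.randint(mcast_min, mcast_max)


# The quoted example uses a value between 120 and 150 seconds.
print(next_transmission_delay(120, 150))
```

Randomizing the delay within a range keeps a large cluster from transmitting the same metric in lockstep.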

Re: [Ganglia-general] question about ganglia

2002-10-25 Thread Steven Wagner
Adil Hasan wrote: Hello, I quickly took a look at Ganglia and it looks like a nice tool for monitoring some of our servers. However, I'd like to be able to run as a non root user. Is it possible to do this? Or, is there another tool that would be better suited for non-root users?

Re: [Ganglia-general] The Illuminati Order

2002-10-23 Thread Steven Wagner
The fnord content was too low to be from the REAL Illuminati. I suppose my fnord detector code might be broken, but I fed the front page of cnn.com through it and it went crazy so I'm pretty sure it's working... Doug fNordwall wrote: I admit, this was the first spam that I've actually found

Re: [Ganglia-general] Newbie error

2002-10-17 Thread Steven Wagner
[EMAIL PROTECTED] wrote: I get a lovely bit of code. It seems to be working. Depends on the length and breadth of the code. If it's displaying metrics, then it's working. If it just has the DTD and there's no real data (no CLUSTER or HOST tags), it ain't. Also, did you install it in
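
A small sketch of that check, assuming the XML dump has already been captured as a string. The root element name, the attribute names, and the sample documents are made up for illustration; only the CLUSTER and HOST tag names come from the message:

```python
import xml.etree.ElementTree as ET


def has_metric_data(xml_text: str) -> bool:
    """Return True if the document contains at least one CLUSTER or HOST element."""
    root = ET.fromstring(xml_text)
    return any(el.tag in ("CLUSTER", "HOST") for el in root.iter())


# Hypothetical samples: one dump with no real data, one with a cluster and a host.
EMPTY = "<GANGLIA_XML></GANGLIA_XML>"
POPULATED = (
    '<GANGLIA_XML><CLUSTER NAME="test">'
    '<HOST NAME="node1"/></CLUSTER></GANGLIA_XML>'
)

print(has_metric_data(EMPTY))
print(has_metric_data(POPULATED))
```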

Re: [Ganglia-general] Newbie error

2002-10-16 Thread Steven Wagner
[EMAIL PROTECTED] wrote: Hi all, [points at Ben] HA-ha! OK, now that we've gotten the Nelson laugh out of the way... [not being a ROCKS guy, I defer on all these points to anyone who is *cough*fed*cough*] I just installed a ROCKS 2.21 cluster, which seemed to have ganglia 1.05 or

Re: [Ganglia-general] Ganglia 2.5.0 on Solaris 8

2002-10-08 Thread Steven Wagner
Andrew Gill wrote: I'm trying to get Ganglia to work on Solaris 8, and seem to be hitting my head against a wall. I can compile it without any problems, using gcc-3.2. However, the gmond binary exits immediately (return code 0) and no gmond process runs in the background. A 'truss' of gmond

Re: [Ganglia-general] update rate

2002-10-07 Thread Steven Wagner
[EMAIL PROTECTED] wrote: Orest. Does ganglia toolkits have possibilities to slow down database updating rate not 15 seconds but 30 (60 ) ? If you find metrics are updating too often, you can modify the values in $GANGLIA_SOURCE/gmond/metric.h (look for mcast_min and mcast_max). If you're

Re: [Ganglia-general] explanation of metrics

2002-10-03 Thread Steven Wagner
Matt once wondered (on the dev list) why I don't write documentation. So after a solid day of SCSI troubleshooting, I thought I'd, you know, contribute... --- Here are the metrics that are widely supported across different platforms (or, in a few cases, the ones we *wish* were supported

Re: [Ganglia-general] Web Front End Problems

2002-10-01 Thread Steven Wagner
This may or may not be it, but when I first set up the ganglia frontend, I needed to turn on register_globals in my php.ini file. The variables passed to the different scripts (notably graph.php) just weren't being accessed. Then again, that was the first release... this may have been fixed

Re: [Ganglia-general] Ganglia is not secure. (WOLF!)

2002-09-17 Thread Steven Wagner
Cripes, way to freak out the developers. I hope you never see The Adventures of Pluto Nash on an airplane, otherwise you might loudly declare that you just saw a bomb. :P This is normal behavior - 239.2.11.71 is a multicast address. Ganglia's entire metric transmission system is based
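
For anyone who wants to confirm that an address seen in a packet trace is multicast, a quick check with Python's standard ipaddress module might look like this. The helper name is illustrative; the address is the one from the message:

```python
import ipaddress


def is_multicast(address: str) -> bool:
    """Return True if the address falls in a multicast range."""
    return ipaddress.ip_address(address).is_multicast


print(is_multicast("239.2.11.71"))
```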

Re: [Ganglia-general] Ganglia is not secure. (WOLF!)

2002-09-17 Thread Steven Wagner
:[EMAIL PROTECTED] Behalf Of Steven Wagner Sent: Tuesday, September 17, 2002 3:15 PM To: ganglia-general@lists.sourceforge.net Subject: Re: [Ganglia-general] Ganglia is not secure. (WOLF!) Cripes, way to freak out the developers. I hope you never see The Adventures of Pluto Nash on an airplane

Re: [Ganglia-general] Ganglia is not secure. (WOLF!)

2002-09-17 Thread Steven Wagner
Jeffrey B. Layton wrote: At least you are thinking about security. You would be surprised how many people don't even think about it! Don't feel bad. Jeff I'd also like to add that the timing of this e-mail was *perfect* as we are readying a nice shiny new release and, if there WAS a major

Re: [Ganglia-general] Unusual behaviour of gmond 2.4.1

2002-08-28 Thread Steven Wagner
Try running the monitoring cores in debug mode (in the foreground) to see if they're receiving multicast packets from other hosts. You may need to increase your mcast_ttl value. Remember that all monitoring cores must use the same multicast address and port, otherwise they won't hear one
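
One way to audit that rule is to group hosts by the multicast address and port they are configured with. This is only a sketch: the host names, the second address, and the dictionary layout are hypothetical, and it does not read real gmond.conf files:

```python
from collections import defaultdict


def group_by_channel(configs):
    """Group host names by their (multicast address, port) pair.

    Hosts that land in different groups will not hear one another.
    """
    groups = defaultdict(list)
    for host, (address, port) in configs.items():
        groups[(address, port)].append(host)
    return dict(groups)


# Hypothetical inventory: node3 was left on a different multicast address.
configs = {
    "node1": ("239.2.11.71", 8649),
    "node2": ("239.2.11.71", 8649),
    "node3": ("239.2.11.72", 8649),
}

for channel, hosts in group_by_channel(configs).items():
    print(channel, hosts)
```

Any group containing fewer hosts than expected points at a node whose address or port needs correcting.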

Re: [Ganglia-general] Problems with gstat

2002-08-28 Thread Steven Wagner
If memory serves me correctly, the heartbeat metric was not added until midway through our long CVS-only push from 2.4.1 to 2.5.0. Before this implementation, it was difficult to really be sure whether a node was down or had just randomly decided to wait more than 20-30 seconds to transmit a

Re: [Ganglia-general] Figured some stuff out for SuSE (was: libssl and libcrypto in SuSE openssl rpms)

2002-08-26 Thread Steven Wagner
HPC Mail Acct. wrote: Hi Matt + list, chorus of Hi, mailguy! :P One other small unrelated thing - From your documentation: If you want to monitor a node but do not want it to show up in the list of hosts returned by gmond for gexec use, simply start gmond on that node with the --no_gexec

Re: [Ganglia-general] high load with gmetad

2002-08-22 Thread Steven Wagner
Remember that RRD files are of a fixed size. In other words, they should never grow beyond their original size when created. That's why they call 'em round-robin databases. :) So the only reason new RRDs would be created is if new metrics were added for existing hosts or if new hosts were
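
The round-robin idea can be illustrated with a fixed-capacity buffer. This is a toy model, not RRDtool's file format: the class name is made up, and unlike a real RRD file (which is allocated at full size when created) the buffer here simply stops growing once it reaches its slot count:

```python
from collections import deque


class RoundRobinArchive:
    """Toy fixed-capacity archive: the oldest value is overwritten when full."""

    def __init__(self, slots: int):
        self._values = deque(maxlen=slots)

    def update(self, value: float) -> None:
        # Appending to a full deque silently discards the oldest entry.
        self._values.append(value)

    def values(self) -> list:
        return list(self._values)


archive = RoundRobinArchive(slots=3)
for sample in range(5):
    archive.update(sample)
print(archive.values())
```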

Re: [Ganglia-general] high load with gmetad

2002-08-21 Thread Steven Wagner
markp wrote: Is anyone experiencing a high load with gmetad? I've run this daemon on a high end intel 933mhz dual proc machine with 1gb of memory and RH 7.2. Loads get and stay as high as 3. I get worse results on single processor machines, loads as high as 6.7 Kill the daemon and it drops

Re: [Ganglia-general] raising granularity of gmetad

2002-08-06 Thread Steven Wagner
Joe Kaiser wrote: Hi, I am interested in getting greater granularity on some of the metrics, especially over greater lengths of time. For example, if I wanted to see the one hour cpu load and how it changed over an hour/day/week and I wanted to have the same granularity at one week as I do at

Re: [Ganglia-general] gmetad source code problem

2002-07-25 Thread Steven Wagner
Martin Margo wrote: Dear Mr. Massie Sir, I am really sorry to bug you again this time. But I have finally sorted out all kind of problems and have finally getting closer to the problem. I execute the /sbin/gmetad script and viewed the /logs/gmetad.log file and in there it said User of

Re: [Ganglia-general] GMETAD problem

2002-07-24 Thread Steven Wagner
Martin Margo wrote: Hi Steven, thanks a lot for your help. I checked out the logs and restarted the daemon couple of times, and waited for 5-10 minutes. I took a look at the daemon logs and in it, it said Use of uninitialized value in hash element at ./gmetad line 109. over and over again to

Re: [Ganglia-general] Help with getting info out of ganglia

2002-07-15 Thread Steven Wagner
Yujun_Wu wrote: I am working on getting the monitoring info out of ganglia and put them into a grid-level monitoring tool. I find I can do this in three ways after browsing the ganglia documentation: 1. telnet remote.cluster.nodename 8649 2. gstat 3. through rrdb The first one (using
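
The first option (telnet to port 8649) can be scripted. The sketch below assumes the daemon writes its XML as soon as a client connects and then closes the connection; the function name and timeout are illustrative, and the host name is the placeholder from the message:

```python
import socket


def fetch_xml(host: str, port: int = 8649, timeout: float = 5.0) -> str:
    """Connect to the XML port and read until the remote side closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:  # empty read means the peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example call (requires a reachable host):
# xml_text = fetch_xml("remote.cluster.nodename")
```

The returned string can then be handed to whatever XML parser the grid-level tool already uses.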

Re: [Ganglia-general] A gmetad question

2002-07-12 Thread Steven Wagner
Joe Griffin wrote: Hello, I have two clusters running ganglia/gmetad wonderfully. Each cluster has its own name and gmetad separates the clusters by those names (the headnode name). I have a third cluster which has two types of nodes within the same cluster (type1 and type2). But gmetad

Re: [Ganglia-general] perl and solaris

2002-07-11 Thread Steven Wagner
[EMAIL PROTECTED] wrote: I am trying to run gmond and gmetad for the first time, and I am having trouble getting it to work. I think the problem involves either the version of perl I am using or that I am trying to run it on Solaris. The machines I am trying to run the ganglia monitor on are

[Ganglia-general] [gmetad] spotty updates - solution :)

2002-07-03 Thread Steven Wagner
Well, I have no idea if this is an official solution but it sure as heck worked for me. I thought I'd share. Here's the problem I was having, in a nutshell: * Boxes in my Solaris cluster appeared to disappear and reappear between page views of gmetad-frontend. i.e., metacluster view says

Re: [Ganglia-general] slackware 8

2002-07-03 Thread Steven Wagner
Try adding debug_level 10 (or 100 - just greater than one) to your /etc/gmond.conf and start gmond again to see where it dies. Also, you *are* running it as superuser, right? It setuids itself but does seem to need to be started by root... Aaron Lott wrote: Has anyone had luck getting

Re: [Ganglia-general] What version of Linux Kernel...?

2002-07-01 Thread Steven Wagner
Ionescu Razvan-RIONESC1 wrote: Hi! Could anybody tell me what Linux kernel version is needed for running Gmond (and Gmetad)? Or which modules are mandatory? I use a 2.4.5 kernel and it didn't work; in fact I am able to get an XML, but without any information about nodes. I worked with a 2.4.17

[Ganglia-general] [gmetad] Intermittent results reported?

2002-06-28 Thread Steven Wagner
Just wondering if anyone else has experienced problems with one cluster's metrics not being reported consistently in a gmetad multi-cluster setup. At the moment I have a (fairly homogenous) 30-node all-Linux cluster that reports very strongly (although for some reason cpu_num is reported as 1,

Re: [Ganglia-general] Delay between 2 multicast

2002-06-27 Thread Steven Wagner
Gonéri Le Bouder wrote: On Wed 26/06/2002 at 18:41, Steven Wagner wrote: Gonéri Le Bouder wrote: Is it possible to increase the time between 2 multicasts? Yes, but you need to edit the source and recompile gmond to do it. Open $TOP_DIR/gmond/metric.h and revise the values upwards

[Ganglia-general] gmond/solaris - ready for alpha

2002-06-12 Thread Steven Wagner
Good news, everyone! Most of the hardcore development I've been doing on solaris.c for ganglia-monitoring-core 2.3.1b1 (the last version to compile and execute for me on Solaris 8) is now finished. Since I'm monitoring a group of fileservers, I've also added some metrics. This means that,