forgot the list ...

----- Forwarded Message ----
> From: Martin Knoblauch <kn...@knobisoft.de>
> To: Cameron Spitzer <cspit...@nvidia.com>
> Sent: Wed, February 3, 2010 11:48:10 AM
> Subject: Re: [Ganglia-general] any workaround for the bogus spikes problem?
>
> > From: Cameron Spitzer
> > To: kn...@knobisoft.de
> > Cc: "ganglia-general@lists.sourceforge.net"
> > Sent: Tue, February 2, 2010 6:49:52 PM
> > Subject: Re: [Ganglia-general] any workaround for the bogus spikes problem?
> >
> > Martin Knoblauch wrote:
> >
> >>> We're trying to use Ganglia to monitor some HP DL580-G5 machines.
> >>> We're using a 64-bit linux-2.6.16.
> >>
> >> which version of Ganglia?
> >
> > ganglia-3.1.2
> >
> >>> The network traffic information is polluted with phantom 20 PB traffic
> >>> spikes.
> >
> > I tried lowering the silliness threshold from 1e13 and 1e8 to 4.0e9 and
> > 3.0e6, and I cranked the collect_every on that group from 40 (seconds?) to 5.
> > Now I get exabyte peaks instead of petabyte peaks.
> >
> >> what kind of NIC do you have (1GB, 10 GB)? Which hardware and driver is
> >> loaded? What is the average network throughput you see?
> >
> > It's the 1 Gbps NIC on the server motherboard, BCM5708 Rev 12.
> > dmesg says, Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.5.5b
> > (January 31, 2007).
>
> BCM sounds familiar. Which distro are you using, which kernel?
>
> >>> I found an ifdef for REMOVE_BOGUS_SPIKES in libmetrics/linux/metrics.c
> >>> Defining it has no effect.
> >>
> >> Maybe you can add some debugging output and check whether that stuff is
> >> triggered at all. Maybe the thresholds are not good anymore.
> >
> > Some hints about how to do that would help. I've tried adding
> > err_msg() calls and I can't find where the messages go. They're not in
> > any of the syslog channels.
> > I don't understand the structure of libmetrics/linux/metrics.c well
> > enough to guess where it would make sense to open a new log file.
>
> If daemonized, messages go to syslog. If run in foreground, they go to stderr.
>
> Just try running the gmond with "-d 1" in foreground. You should already get
> some output in the overflow case.
>
> >> And btw. that code does not *remove* bogus spikes from the RRD database. It
> >> just is supposed to prevent their generation.
> >
> > I realize that. With each hack to libmetrics/linux/metrics.c, I've
> > been stopping gmetad and removing all the corrupted rrd files.
> > I don't know how to edit an rrd file.
>
> The contrib directory in "trunk" has the actual "removespikes.pl" file from
> the RRD source repository. Useful for updating databases that you do not want
> to throw away.
>
> >>> Can anyone tell me the unit of measure which applies to l_bin and l_bout
> >>> in that file?
> >>> Is it bytes per second, bytes per collect_every, bytes per time_threshold?
> >>
> >> Not completely sure.
> >
> > It would be really great if the authors of libmetrics/linux/metrics.c
> > would document it.
>
> Looking at the code, it is per second:
>
>     /*
>     ** Compute timediff. Check for bogus delta-t
>     */
>     float t = timediff(&proc_net_dev.last_read,&stamp);
>     if ( t < proc_net_dev.thresh) {
>         err_msg("update_ifdata(%s) - Dubious delta-t: %f",caller,t);
>         return;
>     }
>     stamp = proc_net_dev.last_read;
>
>     /*
>     ** Compute rates in local variables
>     */
>     l_bin = l_bytes_in / t;
>     l_bout = l_bytes_out / t;
>     l_pin = l_pkts_in / t;
>     l_pout = l_pkts_out / t;
>
> Cheers
> Martin
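For illustration only, the per-second calculation quoted above can be sketched as a small standalone C routine. This is not the libmetrics code. The names rate_state, rate_init and rate_update are hypothetical, and the handling of a counter that goes backwards and of a rate ceiling are assumptions about what a spike guard might reasonably do, not a description of what REMOVE_BOGUS_SPIKES does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-interface state; none of these names come from libmetrics. */
typedef struct {
    uint64_t last_bytes; /* byte counter at the previous accepted sample */
    double   last_time;  /* time of that sample, in seconds */
    double   min_dt;     /* smallest delta-t we trust, in seconds (should be > 0) */
    double   max_rate;   /* assumed ceiling in bytes/s; anything above is dropped */
    double   rate;       /* last accepted rate, in bytes/s */
} rate_state;

void rate_init(rate_state *s, uint64_t bytes, double now,
               double min_dt, double max_rate)
{
    assert(s != NULL);
    s->last_bytes = bytes;
    s->last_time  = now;
    s->min_dt     = min_dt;
    s->max_rate   = max_rate;
    s->rate       = 0.0;
}

/* Returns 1 if a new rate was stored in s->rate, 0 if the sample was rejected. */
int rate_update(rate_state *s, uint64_t bytes, double now)
{
    assert(s != NULL);

    /* Dubious delta-t: leave the state untouched and wait for the next sample. */
    double dt = now - s->last_time;
    if (dt < s->min_dt)
        return 0;

    /* Counter went backwards (reset or wrap): take a new baseline, report nothing. */
    if (bytes < s->last_bytes) {
        s->last_bytes = bytes;
        s->last_time  = now;
        return 0;
    }

    /* Bytes since the last sample divided by elapsed seconds gives bytes per second. */
    double rate = (double)(bytes - s->last_bytes) / dt;
    s->last_bytes = bytes;
    s->last_time  = now;

    /* Implausibly high rate: keep the previous value instead of recording a spike. */
    if (rate > s->max_rate)
        return 0;

    s->rate = rate;
    return 1;
}
```

With a 1 Gbps NIC, a ceiling of about 1.25e8 bytes/s (line rate) would be one plausible choice for max_rate. That figure is an assumption for this sketch and is unrelated to the thresholds mentioned in the thread.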