[Ganglia-developers] patches for: [Sec] Gmetad server BoF and network overload + [Feature] multiple requests per conn on interactive port

2009-01-13 Thread Spike Spiegel
Hi, I wanted to add a feature to gmetad so that it was possible to request multiple items per connection on the interactive port, and while doing so I uncovered a buffer overflow that crashes the gmetad server. I'm releasing both patches together because they are somewhat connected. Apologies if t

Re: [Ganglia-developers] patches for: [Sec] Gmetad server BoF and network overload + [Feature] multiple requests per conn on interactive port

2009-01-15 Thread Spike Spiegel
On Fri, Jan 16, 2009 at 7:04 AM, Kostas Georgiou wrote: > On Thu, Jan 15, 2009 at 01:41:53PM -0700, Brad Nicholes wrote: > >> >>> On 1/15/2009 at 8:56 AM, in message >> <496efa2a02ac0003a...@lucius.provo.novell.com>, "Brad Nicholes" >> wrote: >> >> After taking a little closer look at the pat

Re: [Ganglia-developers] Possible REST interface to the interactive port?

2009-01-17 Thread Spike Spiegel
Hi, On Sat, Jan 17, 2009 at 5:04 AM, john allspaw wrote: > > Hey all - > > Wondering if there's ever been any talk about serving up the interactive port > info via REST? I am kinda working on this already although not in the form of a ganglia patch, but as an external application that pulls dat

Re: [Ganglia-developers] patches for: [Sec] Gmetad server BoF and network overload + [Feature] multiple requests per conn on interactive port

2009-01-17 Thread Spike Spiegel
On Sat, Jan 17, 2009 at 12:03 AM, Brad Nicholes wrote: > I have updated the bug #223 with a consolidated patch. Please review and > provide feedback for backporting to the 3.1 branch. Once we get the patched > reviewed and backported to 3.1, let's move forward with the release of 3.1.2. > > +1

Re: [Ganglia-developers] patches for: [Sec] Gmetad server BoF and network overload + [Feature] multiple requests per conn on interactive port

2009-01-18 Thread Spike Spiegel
On Sun, Jan 18, 2009 at 7:35 PM, Carlo Marcelo Arenas Belon wrote: >> other than that looks good to me. > > could you check the "simplified" one?, this problem was introduced in > 2003 and therefore affects all versions of ganglia since then (including > 2.5.7 which is not supported anymore and th

Re: [Ganglia-developers] patches for: [Sec] Gmetad server BoF and network overload + [Feature] multiple requests per conn on interactive port

2009-01-18 Thread Spike Spiegel
On Sun, Jan 18, 2009 at 9:08 PM, Carlo Marcelo Arenas Belon wrote: > the proposed solution will result in a truncated XML which then will fail to > be parsed in the client and in an obscure error like "unable to write > XML tree info". > > agree that returning the whole tree isn't the best way to

Re: [Ganglia-developers] patches for: [Sec] Gmetad server BoF and network overload + [Feature] multiple requests per conn on interactive port

2009-01-18 Thread Spike Spiegel
On Sun, Jan 18, 2009 at 10:27 PM, Carlo Marcelo Arenas Belon wrote: > http://bugzilla.ganglia.info/cgi-bin/bugzilla/attachment.cgi?id=188&action=view > that should apply cleanly to 3.1.1, 3.0.7 and 2.5.7 Looks fine to me, altho I'd argue that without the other changes supporting the multi-patch b

Re: [Ganglia-developers] patches for: [Sec] Gmetad server BoF and network overload + [Feature] multiple requests per conn on interactive port

2009-01-18 Thread Spike Spiegel
On Mon, Jan 19, 2009 at 5:44 AM, Carlo Marcelo Arenas Belon wrote: > agree, but that is to be done in the context of getting "multi-patch" > committed and backported, but not in fixing this buffer overflow in the > interactive port, which is what BUG223 is about. ok, guess I'll start a different

Re: [Ganglia-developers] Possible REST interface to the interactive port?

2009-01-21 Thread Spike Spiegel
On Wed, Jan 21, 2009 at 2:52 AM, Brad Nicholes wrote: > Yep, I was also thinking that a RESTful output module for gmetad-python would > probably be the easiest solution I haven't used gmetad-python yet so one concern would be performances and how it'd behave having to aggregate and serve a lot o

[Ganglia-developers] gmetad protocol and propagating errors back to the client

2009-01-21 Thread Spike Spiegel
Hi, right now when gmetad fails an error is logged and in some cases the connection to the client interrupted returning invalid XML or in other cases (item not found or broken request) the entire tree is returned. This imho is bad behavior and code should be added to inform the client of the error

Re: [Ganglia-developers] gmetad protocol and propagating errors back to the client

2009-01-23 Thread Spike Spiegel
On Thu, Jan 22, 2009 at 6:55 PM, Carlo Marcelo Arenas Belon wrote: > the interactive port was designed to mimic the behaviour from the > original gmetad port which always returns the whole tree. why's that? if I wanted the whole tree I'd query the non interactive port, instead I'm asking for spec

Re: [Ganglia-developers] CVE

2009-01-23 Thread Spike Spiegel
On Fri, Jan 23, 2009 at 11:52 PM, Brad Nicholes wrote: >>> * http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2009-0242 >>> >>> "Ganglia 3.1.1 allows remote attackers to cause a denial of service via >>> a request to the gmetad service with a path does not exist, which causes >>> Ganglia to (

Re: [Ganglia-developers] gmond python module interface

2009-01-31 Thread Spike Spiegel
Hi, provided that I haven't had the time to look at this part of the code yet and that I agree it would be much nicer to have a gmetric-like behavior, On Sun, Feb 1, 2009 at 12:21 AM, David Stainton wrote: > I like using gmetric to monitor... so I wrote gmetric-daemon which > is my attempt at a

[Ganglia-developers] Thoughts on host spoofing

2009-02-04 Thread Spike Spiegel
Hi, while looking at taking advantage of host spoofing a few things came up that I'd like to discuss. Assuming that host spoofing can be used for: a) overriding host names because using PTRs can be a PITA depending on the infrastructure (for example if you have tun/ethX prefixes) b) streamlining

Re: [Ganglia-developers] Thoughts on host spoofing

2009-02-05 Thread Spike Spiegel
Hi, thanks for the clarification, On Thu, Feb 5, 2009 at 11:44 PM, Brad Nicholes wrote: > Anything else the spoofing module needs to do beyond this > it completely up to the module developer. Any or all of what you > described below should just be module implementation details. Ok, so given t

Re: [Ganglia-developers] Thoughts on host spoofing

2009-02-06 Thread Spike Spiegel
On Fri, Feb 6, 2009 at 2:52 PM, Rick Cobb wrote: > My thought > is that the fewer underlying services a monitoring system needs to work, the > more likely it is to work. Absolutely, but dns itself is actually a good example of how introducing a dependency was necessary to make "a service" usable.

[Ganglia-developers] metric loss and send channel failures in a multi-channel setup

2009-08-17 Thread Spike Spiegel
Hi, we have a setup with 2 unicast channels and we recently ran across an issue where we lost a bunch of metrics submitted with gmetric due to a problem with dns that made one of the two channels unreachable. I traced this back to libgmond.c and Ganglia_udp_send_channels_create(...) where the code

Re: [Ganglia-developers] metric loss and send channel failures in a multi-channel setup

2009-08-22 Thread Spike Spiegel
On Mon, Aug 17, 2009 at 7:56 PM, Spike Spiegel wrote: > thanks for your input, I've given this a go and there's a patch attached to this email that I'd like to hear comments about. I've never used apr before, but based on the documentation [1] apr_array_push will allocat

[Ganglia-developers] RRD_update illegal attempt to update using time 1252671437 when last update time is 1252671437 (minimum one second step)

2009-09-11 Thread Spike Spiegel
Hi, our gmetad boxes (2 of them) with 12 data sources, 6 of which are gmetad and 6 gmonds, are spamming syslog like mad with the following message: Sep 6 06:33:32 localhost.localdomain /usr/sbin/gmetad[2526]: RRD_update (/var/lib/ganglia/rrds/...metric.rrd): illegal attempt to update using time
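
For readers unfamiliar with the message quoted above: rrdtool rejects an update whose timestamp is not strictly later than the file's last update (the "minimum one second step"). The sketch below only illustrates that condition from the caller's side. It is not gmetad's code, and the class name `LastUpdateGuard` and its method are hypothetical.

```python
class LastUpdateGuard:
    """Hypothetical helper: remembers the last timestamp written per RRD file."""

    def __init__(self):
        self._last = {}  # rrd path -> last accepted timestamp (epoch seconds)

    def should_update(self, rrd_path, timestamp):
        """Return True only if timestamp is strictly newer than the last one."""
        last = self._last.get(rrd_path)
        if last is not None and timestamp <= last:
            # Same or older second: rrdtool would log "illegal attempt to update".
            return False
        self._last[rrd_path] = timestamp
        return True


guard = LastUpdateGuard()
print(guard.should_update("metric.rrd", 1252671437))  # first write is accepted
print(guard.should_update("metric.rrd", 1252671437))  # repeat of same second is skipped
```

A guard like this would only hide the log noise. It says nothing about why two updates for the same metric arrive within one second, which is what the thread is asking about.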

[Ganglia-developers] gmetad spamming logs with "unable to write root epilog"

2009-09-11 Thread Spike Spiegel
Hi, recently we added better monitoring for our ganglia infrastructure and one of the checks for gmetad contacts it on port 8651, looks for some XML string and exits (receiving 20+ MBs of xml every time we run the check isn't an option). The 'exits' part means sending a RST before gmetad has sent
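
A check of the kind described above could look roughly like the following sketch. It is not the poster's actual check: the function names, the default marker `<GANGLIA_XML`, and the timeout are assumptions made for illustration. It reads from port 8651 only until the marker is seen and then closes the connection, which is the early disconnect the message refers to.

```python
import socket


def stream_contains(chunks, marker):
    """Return True as soon as marker appears in the concatenated byte chunks."""
    tail = b""
    for chunk in chunks:
        window = tail + chunk
        if marker in window:
            return True
        # Keep the last len(marker)-1 bytes so a marker split across
        # two chunks is still found on the next iteration.
        tail = window[-(len(marker) - 1):] if len(marker) > 1 else b""
    return False


def recv_chunks(sock, size=65536):
    """Yield data from the socket until the peer closes the connection."""
    while True:
        data = sock.recv(size)
        if not data:
            return
        yield data


def check_gmetad(host="localhost", port=8651, marker=b"<GANGLIA_XML", timeout=5.0):
    """Hypothetical check: True if marker shows up before the stream ends."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Leaving the with-block closes the socket while gmetad may still be
        # writing, so the server sees its remaining writes fail.
        return stream_contains(recv_chunks(sock), marker)
```

Stopping at the first match keeps the check cheap, at the cost of gmetad seeing a failed write for the rest of the tree on every run.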

Re: [Ganglia-developers] Fwd: [Ganglia-general] Another interface for Ganglia stats

2009-09-17 Thread Spike Spiegel
On Fri, Sep 18, 2009 at 8:32 AM, Bernard Li wrote: > Forwarding this to ganglia-developers since this is a more -devel > related discussion.  Also can get spike's opinions in ;-) remember that you asked for it :P > On Wed, Sep 16, 2009 at 11:49 AM, Vladimir Vuksan wrote: >> There have been some

Re: [Ganglia-developers] Another interface for Ganglia stats

2009-09-26 Thread Spike Spiegel
On Tue, Sep 22, 2009 at 9:05 AM, Vladimir Vuksan wrote: > I guess a lot of the conversation depends on what you want and expect > Ganglia to be used for. For example there are a lot of people out there > that are using Ganglia for performance monitoring and using Nagios NRPE > to get user level st

Re: [Ganglia-developers] Feeble attempt at gmond aliasing

2009-10-09 Thread Spike Spiegel
On Fri, Oct 2, 2009 at 9:59 PM, Jesse Becker wrote: > On Fri, Oct 2, 2009 at 10:35, Brad Nicholes wrote: >> How well does this fit into the previous discussions of using a GUID to >> identify a box rather than an IP or FQDN?  Are aliasing and GUID identifiers >> related or are they two separate

Re: [Ganglia-developers] Feeble attempt at gmond aliasing

2009-10-09 Thread Spike Spiegel
On Fri, Oct 9, 2009 at 9:48 PM, Jesse Becker wrote: > The "GUID discussion" I refered to was if gmond/gmetad should be > rewritten, top-to-bottom, to use GUIDs instead of relying on DNS/IP > addresses.  My understanding is that everything would have use them, > including the .rrd files underneath.

Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-05 Thread Spike Spiegel
On Wed, Nov 25, 2009 at 4:20 PM, Daniel Pocock wrote: > One problem I've been wondering about recently is the scalability of > gmetad/rrdtool. [cut] > In a particularly large organisation, moving around the RRD files as > clusters grow could become quite a chore.  Is anyone putting their RRD > f

Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-13 Thread Spike Spiegel
On Fri, Dec 11, 2009 at 1:34 PM, Daniel Pocock wrote: > Thanks for sharing this - could you comment on the total number of RRDs per > gmetad, and do you use rrdcached? the largest colo has 140175 rrds and we use the tmpfs + cron hack, no rrdcached. > I was thinking about gmetads attached to the

Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-20 Thread Spike Spiegel
On Mon, Dec 14, 2009 at 2:00 AM, Vladimir Vuksan wrote: > I think you guys are complicating much :-). Can't you simply have multiple > gmetads in different sites poll a single gmond. That way if one gmetad fails > data is still available and updated on the other gmetads. That is what we > used to

Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-20 Thread Spike Spiegel
On Mon, Dec 14, 2009 at 10:28 AM, Carlo Marcelo Arenas Belon wrote: >> a) you are only concerned with redundancy and not looking for >> scalability - when I say scalability, I refer to the idea of maybe 3 or >> more gmetads running in parallel collecting data from huge numbers of agents > > what i

Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-21 Thread Spike Spiegel
On Sun, Dec 20, 2009 at 7:35 PM, Vladimir Vuksan wrote: > If you lose a day or > two or even a week of trending data that is not gonna be disaster as long > as that data is present somewhere else. sure, but where? how would the ganglia frontend tell? > Thus I proposed a simple solution > where e

Re: [Ganglia-developers] How do we deal with very large clusters in the webui

2011-03-07 Thread Spike Spiegel
Hi, On Thu, Mar 3, 2011 at 11:11 PM, Jim Greene wrote: > -Don't show any individual hosts, only the aggregate and the > load/network/etc levels for the whole cluster we did this on the main page for grids by adding one line of php that excluded the bulk of our computing grid. We also added a "r