On 9/12/14 8:25 AM, daniel.j.marr...@us.hsbc.com wrote:
Greetings all and apologies if I have directed this to the incorrect
list but I thought it might be best to begin with those closest to the
source, so to speak.
I work for a fairly large international bank and we are currently
evaluating options for collecting and visualizing performance-related
statistics for the entirety of our UNIX/Linux estate (with the
possibility of including Windows at some point). Naturally I took to
the internet and came across Ganglia as one of the (widely used)
possible options. I then spent some time looking through reports of
issues, etc. and have some questions/concerns regarding how best to
organize my infrastructure should I decide to recommend Ganglia as the
solution.
In preparation I thought it best to do some discovery around the size
of our estate and the details our end users (system administrators,
performance engineers, etc.) say need to be included in the metric
set. To that end, we have approximately 26K servers today and, by
rough extrapolation, could easily end up in the neighborhood of 4.5M
metrics across the whole system. We expect to extend the base set of
metrics with any number of middleware-related measurements, which is
the primary reason for the large metric count. We will also be using
unicast unless, of course, a compelling enough case can be made for
the alternative.
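As a rough back-of-envelope check on those figures, the sketch below
works out the implied metrics per host and the aggregate update rate.
The 15-second reporting interval is an assumption made for
illustration (not every metric would necessarily report that often),
and the variable names are hypothetical.

```python
# Back-of-envelope sizing from the figures quoted in the message.
SERVERS = 26_000            # "approximately 26K servers"
TOTAL_METRICS = 4_500_000   # "neighborhood of 4.5M" metrics

# ASSUMPTION: every metric is reported once per 15-second interval.
ASSUMED_INTERVAL_S = 15

metrics_per_host = TOTAL_METRICS / SERVERS
updates_per_second = TOTAL_METRICS / ASSUMED_INTERVAL_S

print(f"metrics per host:   {metrics_per_host:.0f}")
print(f"updates per second: {updates_per_second:,.0f}")
```

Under that assumption the estate averages about 173 metrics per host
and about 300,000 metric updates per second in total, which is the
load that the grid/cluster split and any write caching would need to
absorb.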
My initial instinct is to subdivide the Ganglia infrastructure by
major data center, with each one represented by a single grid. I
imagine I would need 6-12 clusters (possibly more) per grid and will
definitely be looking to use rrdcached. I do not know if that will be
enough segregation to allow gmetad to perform as required. Several of
my larger (more influential) end users have indicated a need for some
fairly tight resolutions (15s for 4hrs for a number of high value
metrics).
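To make the "15s for 4hrs" requirement concrete, here is a small
sketch that turns a resolution and retention period into a round-robin
archive definition in rrdtool's RRA:CF:xff:steps:rows form. The helper
name, the 15-second base step, and the AVERAGE/0.5 defaults are
assumptions for illustration, not settings taken from any existing
configuration.

```python
import math

def rra_definition(step_s, resolution_s, retention_s, cf="AVERAGE", xff=0.5):
    """Build an rrdtool RRA string (hypothetical helper).

    step_s       -- base RRD step in seconds (ASSUMED to be 15 here)
    resolution_s -- seconds covered by one stored row
    retention_s  -- how long rows at this resolution are kept
    """
    if resolution_s % step_s != 0:
        raise ValueError("resolution must be a multiple of the base step")
    steps = resolution_s // step_s                 # base steps per row
    rows = math.ceil(retention_s / resolution_s)   # rows to cover retention
    return f"RRA:{cf}:{xff}:{steps}:{rows}"

# 15-second resolution kept for 4 hours.
high_res = rra_definition(step_s=15, resolution_s=15, retention_s=4 * 3600)
print(high_res)  # RRA:AVERAGE:0.5:1:960
```

Four hours at 15-second resolution comes to 960 rows per metric for
that archive; coarser archives for longer retention can be derived the
same way by raising resolution_s.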
I guess my initial question is this ... has anyone done anything like
this, at this scale, with any success and - if so - would it be
possible to get some additional information (scrubbed diagram, etc)
regarding how it is best done? I've been searching the net and keep
coming back to a single image showing a hierarchy of gmetad and some
fairly interesting descriptions of other implementations but nothing
that actually makes it clear to me.
Hi Daniel,
If you haven't already seen it, I would recommend checking out the
Ganglia O'Reilly book
(http://shop.oreilly.com/product/0636920025573.do). It has several case
studies from large and complex organizations, complete with scale
information and some benchmarking, that might help with your planning.
- Adam Compton
_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers