On Fri, Nov 13, 2015 at 4:50 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> Also, what GC settings are you using? We may be able to make some suggestions.
>
> Cumulative GC pauses aren’t very interesting to me. I’m more interested in
> the longest ones, 90th percentile, 95th, etc.

Any advice would be great, but what I'm primarily interested in is how people monitor these statistics in real time, for all time, on production servers. E.g., for the disk or RAM usage of one of my servers, I can look at the historical usage over the last week, last month, last year and so on. I need to get these stats into the same monitoring tools that we use for every other vital aspect of our servers.

Looking at log files can be useful, but I don't want to keep arbitrarily large log files on our servers, nor extract data from them. I want to record it for posterity in one system that understands sampling.

We already use and maintain our own munin systems, so I'm not interested in paid-for equivalents of munin. Regardless of how simple they are to set up, they don't integrate with our other performance monitoring stats, and I would never get budget anyway.

So really:

1) Is it OK to turn JMX monitoring on on production systems? The comments in solr.in.sh suggest not.
2) What JMX beans and attributes should I be using to monitor GC pauses, particularly the maximum length of a single pause in a period, and the total length of pauses in that period?

Cheers

Tom
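
PS: To make question 2 concrete, here is a rough sketch of the kind of sampling I have in mind. It assumes the standard platform beans registered under java.lang:type=GarbageCollector,name=* and their CollectionCount and CollectionTime attributes, read in-process through java.lang.management. The class name GcPauseSampler and the delta() helper are made up for illustration. As far as I can tell, these attributes are cumulative since JVM start and CollectionTime is only an approximate elapsed time, so this would cover "total length of pauses in a period" but not the longest single pause.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of Solr or munin.
public class GcPauseSampler {

    // Sum the cumulative CollectionTime (ms since JVM start) over all collectors.
    static long totalCollectionTimeMillis(List<GarbageCollectorMXBean> beans) {
        long total = 0;
        for (GarbageCollectorMXBean bean : beans) {
            long millis = bean.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Turn two cumulative readings into a per-period value.
    // If the counter went backwards, assume the JVM restarted and
    // count only what has accumulated since the restart.
    static long delta(long previous, long current) {
        return current >= previous ? current - previous : current;
    }

    public static void main(String[] args) {
        List<GarbageCollectorMXBean> beans = ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean bean : beans) {
            System.out.println(bean.getName()
                    + " count=" + bean.getCollectionCount()
                    + " time_ms=" + bean.getCollectionTime());
        }
        System.out.println("total_gc_ms=" + totalCollectionTimeMillis(beans));
    }
}
```

A munin plugin would presumably read the same two attributes remotely over JMX on each poll and graph the difference between consecutive samples, which is what delta() does here.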