Hi folks, I'm a bit new to the operational side of G1, but pretty familiar with its basic concepts. We recently set up a Kafka cluster to support a new product and are seeing some suboptimal GC performance. We're using the parameters suggested in the docs, except that we switched to Java 1.8.0_40 (8u40) to get better memory debugging. Even though the cluster is handling only 2-3k messages per second per node, we see periodic stop-the-world pauses of 11 to 18 seconds, roughly once an hour. I've turned on additional GC logging and see no humongous allocations; it all seems to be buffers making it into the tenured generation. They appear to be collectable, since the collection triggered by dumping the heap collects them all. Ideas for additional diagnosis or tuning are very welcome.
--Cory