@Dustin, @Ben Don't you need to increase the heap size in proportion to the replica count * partition count, so that partition records can be held in memory until they are sent to the replicas? I believe @Ben's Kafka setup has thousands of partitions across its topics.
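Rough back-of-envelope for the replication side alone, assuming the 0.9 default replica.fetch.max.bytes=1048576 and that every followed partition can have a full fetch buffer in flight at once (which may overstate the steady state):

    4,000 followed partitions * 1 MiB (replica.fetch.max.bytes) ~= 4 GB of heap in the worst case

So with thousands of partitions the 256M default heap looks clearly too small, though a calculation like this doesn't obviously call for 16G either.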
On Thu, Jun 9, 2016 at 1:22 PM Dustin Cote <dus...@confluent.io> wrote:

@Ben, the big GC stalls could be related to the 16GB max heap size. When you have a bigger heap size, you need more time to GC if/when you hit a garbage collection. In general, Kafka shouldn't need more than a 5GB heap, and lowering your heap size combined with using the G1GC (and preferably Java 8) should help you have better GC performance.

@Lawrence, you probably want to raise your max heap size up from 256M to 1G (KAFKA_HEAP_OPTS="-Xmx1G") and see how it goes. The total memory used on the system might be 43% but the heap usage is a different measurement. You'll see an out of memory error if you push the JVM heap size beyond 256M in your case even if the system has available memory.

On Thu, Jun 9, 2016 at 1:08 PM, Stephen Powis <spo...@salesforce.com> wrote:

Hey Ben,

Using G1 with those settings appears to be working well for us. Infrequent younggen/minor GCs averaging a run time of 12ms, no full GCs in the 24 hours logged that I uploaded. I'd say enable the GC log flags and let it run for a bit, then change a setting or two and compare.

On Thu, Jun 9, 2016 at 3:59 PM, Ben Osheroff <b...@zendesk.com.invalid> wrote:

We've been having quite a few symptoms that appear to be big GC stalls (nonsensical ZK session timeouts) with the following config:

-Xmx16g
-Xms16g
-server
-XX:+CMSClassUnloadingEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+UseG1GC
-XX:+DisableExplicitGC

Next steps will be to turn on gc logging and try to confirm that the ZK session timeouts are indeed GC pauses (they look like major collections), but meanwhile, does anyone have experience around whether these options (taken from https://kafka.apache.org/081/ops.html) helped? Would prefer to not just blindly turn on options if possible.

-XX:PermSize=48m
-XX:MaxPermSize=48m
-XX:MaxGCPauseMillis=20
-XX:InitiatingHeapOccupancyPercent=35

Thanks!
Ben Osheroff
Zendesk.com

On Thu, Jun 09, 2016 at 03:52:41PM -0400, Stephen Powis wrote:

NOTE -- GC tuning is outside the realm of my expertise by all means, so I'm not sure I'd use our info as any kind of benchmark.

But in the interest of sharing, we use the following options:

export KAFKA_HEAP_OPTS="-Xmx12G -Xms12G"

export KAFKA_JVM_PERFORMANCE_OPTS="-server -Djava.awt.headless=true -XX:MaxPermSize=48M -verbose:gc -Xloggc:/var/log/kafka/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintTLAB -XX:+DisableExplicitGC -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -XX:+UseCompressedOops -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/kafka/heapDump.log"

You can then take your gc.log files and use an analyzer tool... I've attached a link to one of our brokers' gc logs run thru gceasy.io.

https://protect-us.mimecast.com/s/wXqqBJuqdZb1Tn
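If you'd rather not upload logs anywhere, something like this gives a quick summary of stop-the-world time straight from the gc.log. Rough sketch only -- it relies on the "Total time for which application threads were stopped" lines that -XX:+PrintGCApplicationStoppedTime writes, and the exact wording can vary by JVM version, so adjust the pattern if needed:

grep 'Total time for which application threads were stopped' /var/log/kafka/gc.log \
  | sed 's/.*stopped: \([0-9.]*\) seconds.*/\1/' \
  | awk '{ sum += $1; if ($1 > max) max = $1 } END { printf "pauses=%d  total=%.2fs  max=%.3fs\n", NR, sum, max }'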
On Thu, Jun 9, 2016 at 3:39 PM, Lawrence Weikum <lwei...@pandora.com> wrote:

Hi Tom,

Currently we’re using the default settings – no special tuning whatsoever. I think the kafka-run-class.sh has this:

# Memory options
if [ -z "$KAFKA_HEAP_OPTS" ]; then
  KAFKA_HEAP_OPTS="-Xmx256M"
fi

# JVM performance options
if [ -z "$KAFKA_JVM_PERFORMANCE_OPTS" ]; then
  KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true"
fi

Is this the confluent doc you were referring to? https://protect-us.mimecast.com/s/arXXBOspkvORCD

Thanks!

Lawrence Weikum

On 6/9/16, 1:32 PM, "Tom Crayford" <tcrayf...@heroku.com> wrote:

Hi Lawrence,

What JVM options were you using? There's a few pages in the confluent docs on JVM tuning iirc. We simply use the G1 and a 4GB Max heap and things work well (running many thousands of clusters).

Thanks
Tom Crayford
Heroku Kafka

On Thursday, 9 June 2016, Lawrence Weikum <lwei...@pandora.com> wrote:

Hello all,

We’ve been running a benchmark test on a Kafka cluster of ours running 0.9.0.1 – slamming it with messages to see when/if things might break. During our test, we caused two brokers to throw OutOfMemory errors (looks like from the Heap) even though each machine still has 43% of the total memory unused.

I’m curious what JVM optimizations are recommended for Kafka brokers? Or if there aren’t any that are recommended, what are some optimizations others are using to keep the brokers running smoothly?

Best,

Lawrence Weikum

--
Dustin Cote
confluent.io