Yep, that's exactly what the output is showing.

On Fri, Jan 13, 2017 at 11:25 AM, Tauzell, Dave <dave.tauz...@surescripts.com> wrote:
I haven't used dtrace, but is it possible to have it running and recording the ftruncate64 times? Then when you see one of these long roll times, look at the dtrace log to see if it was that call?

-Dave

-----Original Message-----
From: Stephen Powis [mailto:spo...@salesforce.com]
Sent: Friday, January 13, 2017 9:25 AM
To: users@kafka.apache.org
Subject: Re: Taking a long time to roll a new log segment (~1 min)

So the underlying system call is ftruncate64, logged using dtrace:

# BEGIN stack trace for ftruncate, call took 34160541200ns:
args==
0x7f5f9a1134d7 : ftruncate64+0x7/0x30 [/lib64/libc-2.12.so]
0x7f5f987b80c3 : 0x7f5f987b80c3 [/usr/lib/jvm/jdk1.8.0_51/jre/lib/amd64/libjava.so+0x180c3/0x22c000]
# END stack trace for ftruncate

These index files seem to typically be around 10MB for open log segments, and get truncated down to ~1MB after being closed. I'd imagine truncating off ~9MB shouldn't be that heavy of an operation.

ZFS options:

data/kafka  type                  filesystem             -
data/kafka  creation              Tue Feb  9 22:45 2016  -
data/kafka  used                  196G                   -
data/kafka  available             6.65T                  -
data/kafka  referenced            196G                   -
data/kafka  compressratio         4.59x                  -
data/kafka  mounted               yes                    -
data/kafka  quota                 none                   default
data/kafka  reservation           none                   default
data/kafka  recordsize            128K                   default
data/kafka  mountpoint            /kafka                 local
data/kafka  sharenfs              off                    default
data/kafka  checksum              on                     default
data/kafka  compression           lz4                    inherited from data
data/kafka  atime                 off                    inherited from data
data/kafka  devices               on                     default
data/kafka  exec                  on                     default
data/kafka  setuid                on                     default
data/kafka  readonly              off                    default
data/kafka  zoned                 off                    default
data/kafka  snapdir               hidden                 default
data/kafka  aclinherit            restricted             default
data/kafka  canmount              on                     default
data/kafka  xattr                 on                     default
data/kafka  copies                1                      default
data/kafka  version               5                      -
data/kafka  utf8only              off                    -
data/kafka  normalization         none                   -
data/kafka  casesensitivity       sensitive              -
data/kafka  vscan                 off                    default
data/kafka  nbmand                off                    default
data/kafka  sharesmb              off                    default
data/kafka  refquota              none                   default
data/kafka  refreservation        none                   default
data/kafka  primarycache          all                    default
data/kafka  secondarycache        all                    default
data/kafka  usedbysnapshots       0                      -
data/kafka  usedbydataset         196G                   -
data/kafka  usedbychildren        0                      -
data/kafka  usedbyrefreservation  0                      -
data/kafka  logbias               throughput             local
data/kafka  dedup                 off                    default
data/kafka  mlslabel              none                   default
data/kafka  sync                  standard               default
data/kafka  refcompressratio      4.59x                  -
data/kafka  written               196G                   -
data/kafka  logicalused           900G                   -
data/kafka  logicalreferenced     900G                   -
data/kafka  filesystem_limit      none                   default
data/kafka  snapshot_limit        none                   default
data/kafka  filesystem_count      none                   default
data/kafka  snapshot_count        none                   default
data/kafka  snapdev               hidden                 default
data/kafka  acltype               off                    default
data/kafka  context               none                   default
data/kafka  fscontext             none                   default
data/kafka  defcontext            none                   default
data/kafka  rootcontext           none                   default
data/kafka  relatime              off                    default
data/kafka  redundant_metadata    all                    default
data/kafka  overlay               off                    default

CPU IO Wait is extremely low (below 0.01%) with no recorded spikes.

On Fri, Jan 13, 2017 at 6:57 AM, Ismael Juma <ism...@juma.me.uk> wrote:

Hi Stephen,

I think it's worth capturing this in a JIRA so that it doesn't get lost.

Ismael

On Thu, Jan 12, 2017 at 7:22 PM, Stephen Powis <spo...@salesforce.com> wrote:

I've further narrowed it down to this particular line:
https://github.com/apache/kafka/blob/0.10.0/core/src/main/scala/kafka/log/OffsetIndex.scala#L294

But I'm still at a loss as to why this would be slow sometimes and not others.
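For context on that code path: Kafka keeps the offset index memory-mapped and shrinks it on segment close, which is consistent with the ftruncate64 call captured by dtrace above. Below is a minimal, hypothetical Python sketch of the same preallocate/mmap/truncate pattern; the file name is made up, and the only figures taken from the thread are the ~10MB open size and ~1MB trimmed size:

```python
import mmap
import os
import tempfile
import time

PREALLOC = 10 * 1024 * 1024  # open index segments are ~10MB per the thread
TRIMMED = 1 * 1024 * 1024    # trimmed to ~1MB after the segment is closed

# Illustrative temp file standing in for an offset index file
fd, path = tempfile.mkstemp(suffix=".index")
os.close(fd)

with open(path, "r+b") as f:
    f.truncate(PREALLOC)                  # preallocate the index file
    mm = mmap.mmap(f.fileno(), PREALLOC)  # broker keeps the index mmap'd
    mm[:8] = b"\x00" * 8                  # writes go through the mapping
    mm.close()                            # unmap before shrinking
    t0 = time.monotonic()
    f.truncate(TRIMMED)                   # the ftruncate() seen in the trace
    elapsed_ms = (time.monotonic() - t0) * 1000

print(f"truncate {PREALLOC >> 20}MB -> {TRIMMED >> 20}MB took {elapsed_ms:.3f} ms")
os.remove(path)
```

On most filesystems this truncate is near-instant, which is why a 34-second ftruncate64 points at something below the filesystem API rather than at the amount of data trimmed.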
On Thu, Jan 12, 2017 at 10:56 AM, Stephen Powis <spo...@salesforce.com> wrote:

Using the little bash script in that JIRA ticket to go thru the GC log and sum up the total pause times I come up with the following. I don't see anything that would indicate a ~28 second pause.

2017-01-12T07:00 = 72.2961
2017-01-12T07:01 = 59.8112
2017-01-12T07:02 = 69.6915
2017-01-12T07:03 = 74.4093
2017-01-12T07:04 = 73.507
2017-01-12T07:05 = 73.7028
2017-01-12T07:06 = 72.3392
2017-01-12T07:07 = 70.502
2017-01-12T07:08 = 71.7471
2017-01-12T07:09 = 76.8371
2017-01-12T07:10 = 72.3836
2017-01-12T07:11 = 74.7462
2017-01-12T07:12 = 57.1328
2017-01-12T07:13 = 80.005
2017-01-12T07:14 = 76.0012
2017-01-12T07:15 = 79.2221
2017-01-12T07:16 = 57.5385
2017-01-12T07:17 = 58.1185
2017-01-12T07:18 = 72.0524
2017-01-12T07:19 = 110.6736
2017-01-12T07:20 = 64.9808
2017-01-12T07:21 = 66.8767
2017-01-12T07:22 = 32.409
2017-01-12T07:23 = 85.3671
2017-01-12T07:24 = 123.9212
2017-01-12T07:25 = 70.3219

On Thu, Jan 12, 2017 at 10:49 AM, Stephen Powis <spo...@salesforce.com> wrote:

Just realized that GCEasy doesn't keep reports around for very long anymore; here is a screencap of the report: http://imgur.com/a/MEubD

The longest reported GC pause was 30ms, tho they happen somewhat frequently at an average of once per 12 seconds. KAFKA-4616 certainly sounds just like my scenario, but I'm having trouble finding the long GC pauses described in that ticket in my own logs.
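The bash script from the JIRA ticket isn't reproduced in the thread; a rough Python equivalent that buckets "Total time for which application threads were stopped" lines (the -XX:+PrintGCApplicationStoppedTime output quoted elsewhere in this thread) by minute could look like this. The log format assumed here matches that quoted line, not necessarily the exact script from the ticket:

```python
import re
from collections import defaultdict

# Matches -XX:+PrintGCApplicationStoppedTime output, e.g.
# 2017-01-12T07:16:46.867-0500: 46891.844: Total time for which application
# threads were stopped: 0.0141281 seconds, Stopping threads took: ...
PAUSE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}).*"
    r"Total time for which application threads were stopped: ([\d.]+) seconds"
)

def sum_pauses_by_minute(lines):
    """Sum JVM stop-the-world pause seconds, bucketed per minute."""
    totals = defaultdict(float)
    for line in lines:
        m = PAUSE_RE.match(line)
        if m:
            totals[m.group(1)] += float(m.group(2))
    return dict(totals)

sample = [
    "2017-01-12T07:16:46.867-0500: 46891.844: Total time for which application threads were stopped: 0.0141281 seconds, Stopping threads took: 0.0002171 seconds",
    "2017-01-12T07:16:52.100-0500: 46897.077: Total time for which application threads were stopped: 0.0200000 seconds, Stopping threads took: 0.0001000 seconds",
]
print(sum_pauses_by_minute(sample))
```

A per-minute total that stays this small, as in the sums above, supports the conclusion that no single ~28 second GC pause is hiding in the log.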
On Thu, Jan 12, 2017 at 10:42 AM, tao xiao <xiaotao...@gmail.com> wrote:

You may run into this bug: https://issues.apache.org/jira/browse/KAFKA-4614

On Thu, 12 Jan 2017 at 23:38 Stephen Powis <spo...@salesforce.com> wrote:

Per my email to the list in Sept, when I reviewed GC logs then, I didn't see anything out of the ordinary.
(http://mail-archives.apache.org/mod_mbox/kafka-users/201609.mbox/%3CCABQB-gS7h4Nuq3TKgHoAVeRHPWnBNs2B0Tz0kCjmdB9c0SDcLQ%40mail.gmail.com%3E)

Reviewing the GC logs from this morning around the timestamp in my previous email, I see GC running roughly every 10-12 seconds, with total times similar to the following:

2017-01-12T07:16:46.867-0500: 46891.844: Total time for which application threads were stopped: 0.0141281 seconds, Stopping threads took: 0.0002171 seconds

Here is a link to a GCEasy report:
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDEvMTIvLS10b3RhbEdDLWthZmthMS00LmxvZy5nei0tMTUtMzQtNTk=

Currently using G1 GC with the following settings:

-Xmx12G -Xms12G -server -XX:MaxPermSize=48M -verbose:gc
-Xloggc:/var/log/kafka/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCDetails
-XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
-XX:+PrintTLAB -XX:+DisableExplicitGC -XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -XX:+UseCompressedOops
-XX:+AlwaysPreTouch -XX:+UseG1GC -XX:MaxGCPauseMillis=20
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/kafka/heapDump.log
-Xloggc:/opt/kafka/current/bin/../logs/kafkaServer-gc.log -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps

On Thu, Jan 12, 2017 at 9:55 AM, Tauzell, Dave <dave.tauz...@surescripts.com> wrote:

Can you collect garbage collection stats and verify there isn't a long GC happening at the same time?

-Dave

-----Original Message-----
From: Stephen Powis [mailto:spo...@salesforce.com]
Sent: Thursday, January 12, 2017 8:34 AM
To: users@kafka.apache.org
Subject: Re: Taking a long time to roll a new log segment (~1 min)

So per the kafka docs I up'd our FD limit to 100k, and we are no longer seeing the process die, which is good.

Unfortunately we're still seeing very high log segment roll times, and I'm unsure if this is considered 'normal', as it tends to block producers during this period.

We are running kafka 0.10.0.1, but I patched in some additional timing statements into the kafka.log.Log roll() method to narrow down exactly which part of that method is taking so long.

Again, typically the process to roll a new log file takes only 1-2ms tops, but several times a day it takes 30-60+ seconds, across all of our brokers.
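The "additional timing statements" mentioned above were patched into Kafka's Scala code; as a generic illustration only, the same technique can be sketched as a context manager that records and logs each step's duration in the style of the INFO lines quoted in this thread (all names here are illustrative, not Kafka's):

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(format="[%(asctime)s] %(levelname)s %(message)s",
                    level=logging.INFO)
log = logging.getLogger("kafka.log.Log")

durations_ms = {}  # label -> elapsed ms, so slow steps can be compared later

@contextmanager
def timed(label):
    """Time a block and log it like the 'Rolled new log segment ...' lines."""
    t0 = time.monotonic()
    try:
        yield
    finally:
        ms = (time.monotonic() - t0) * 1000
        durations_ms[label] = ms
        log.info("%s in %d ms. (kafka.log.Log)", label, int(ms))

# Wrap each step of a roll-like operation to see which one stalls:
with timed("Rolled new log segment for 'MyTopic-4'"):
    time.sleep(0.02)  # stand-in for index trimming / file renames
```

Instrumenting each sub-step separately is what lets the thread below pin the stall on one specific call rather than on roll() as a whole.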
I've narrowed it down to this bit of code causing the issue:
https://github.com/apache/kafka/blob/0.10.0/core/src/main/scala/kafka/log/Log.scala#L652-L658

Here's an example of output w/ my additional timing log statements:

[2017-01-12 07:17:58,199] INFO Rolled new log segment for 'MyTopic-4' in 28028 ms. (kafka.log.Log)
[2017-01-12 07:17:58,200] INFO LastEntry match time 'EventStream-4' in 28027 ms. (kafka.log.Log)

Any thoughts? Help? Is this considered normal?

Thanks
Stephen

On Tue, Jan 10, 2017 at 2:37 AM, Ewen Cheslack-Postava <e...@confluent.io> wrote:

I can't speak to the exact details of why fds would be kept open longer in that specific case, but are you aware that the recommendation for production clusters for open fd limits is much higher? It's been suggested to be 100,000 as a starting point for quite awhile:
http://kafka.apache.org/documentation.html#os

-Ewen

On Mon, Jan 9, 2017 at 12:45 PM, Stephen Powis <spo...@salesforce.com> wrote:

Hey!

I've run into something concerning in our production cluster. I believe I've posted this question to the mailing list previously (http://mail-archives.apache.org/mod_mbox/kafka-users/201609.mbox/browser) but the problem has become considerably more serious.

We've been fighting issues where Kafka 0.10.0.1 hits its max file descriptor limit. Our limit is set to ~16k, and under normal operation it holds steady around 4k open files.

But occasionally Kafka will roll a new log segment, which typically takes on the order of a few milliseconds. However, sometimes it will take a considerable amount of time, anywhere from 40 seconds up to over a minute. When this happens, it seems like connections are not released by Kafka, and we end up with thousands of client connections stuck in CLOSE_WAIT, which pile up and exceed our max file descriptor limit. This happens all in the span of about a minute.

Our logs look like this:

[2017-01-08 01:10:17,117] INFO Rolled new log segment for 'MyTopic-8' in 41122 ms. (kafka.log.Log)
[2017-01-08 01:10:32,550] INFO Rolled new log segment for 'MyTopic-4' in 1 ms. (kafka.log.Log)
[2017-01-08 01:11:10,039] INFO [Group Metadata Manager on Broker 4]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-01-08 01:19:02,877] ERROR Error while accepting connection (kafka.network.Acceptor)
java.io.IOException: Too many open files
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
        at kafka.network.Acceptor.accept(SocketServer.scala:323)
        at kafka.network.Acceptor.run(SocketServer.scala:268)
        at java.lang.Thread.run(Thread.java:745)
[2017-01-08 01:19:02,877] ERROR Error while accepting connection (kafka.network.Acceptor)
java.io.IOException: Too many open files
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
        at kafka.network.Acceptor.accept(SocketServer.scala:323)
        at kafka.network.Acceptor.run(SocketServer.scala:268)
        at java.lang.Thread.run(Thread.java:745)
.....

And then kafka crashes.
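One way to watch for the CLOSE_WAIT pile-up described above before it hits the FD limit is to poll the kernel's TCP table. This is a Linux-only sketch (state code 08 in /proc/net/tcp is CLOSE_WAIT); in practice `ss -t state close-wait` or `lsof` would do the same job:

```python
import os

CLOSE_WAIT = "08"  # socket-state code in /proc/net/tcp (hex)

def count_close_wait(proc_tcp="/proc/net/tcp"):
    """Count sockets in CLOSE_WAIT by parsing the kernel's TCP table."""
    count = 0
    with open(proc_tcp) as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            # fields: sl local_address rem_address st ...
            if len(fields) > 3 and fields[3] == CLOSE_WAIT:
                count += 1
    return count

def count_open_fds(pid):
    """Approximate a process's open-FD count (compare against ulimit -n)."""
    return len(os.listdir(f"/proc/{pid}/fd"))

if os.path.exists("/proc/net/tcp"):
    print("CLOSE_WAIT sockets:", count_close_wait())
```

Graphing these two counts alongside the broker log makes it easy to correlate a slow segment roll with the connection backlog that follows it.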
Has anyone seen this behavior of slow log segments being rolled? Any ideas of how to track down what could be causing this?

Thanks!
Stephen

This e-mail and any files transmitted with it are confidential, may contain sensitive information, and are intended solely for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error, please notify the sender by reply e-mail immediately and destroy all copies of the e-mail and any attachments.