Not sure why re-registering in broker fails. Normall, when the broker registers, the ZK path should already be gone.
Thanks, Jun On Thu, Mar 28, 2013 at 8:31 AM, Yonghui Zhao <zhaoyong...@gmail.com> wrote: > Will do a check, I just wonder why broker need re-regiester and it failed, > so broker service is stopped. > > 2013/3/28 Jun Rao <jun...@gmail.com> > > > Do you see lots of ZK session expiration in the broker too? If so, that > > suggests a GC issue in the broker too. So, you may need to tune the GC in > > the broker as well. > > > > Thanks, > > > > Jun > > > > On Thu, Mar 28, 2013 at 8:20 AM, Yonghui Zhao <zhaoyong...@gmail.com> > > wrote: > > > > > Thanks Jun. > > > > > > But I can't understand how consumer GC trigger kafka server issue: > > > java.lang.RuntimeException: A broker is already registered on the path > > > /brokers/ids/0. This probably indicates that you either have > configured a > > > brokerid that is already in use, or else you have shutdown this broker > > and > > > restarted it faster than the zookeeper timeout so it appears to be > > > re-registering. > > > > > > > > > 2013/3/28 Jun Rao <jun...@gmail.com> > > > > > > > The zk session timeout only kicks in if you force kill the consumer. > > > > Otherwise, consumer will close ZK session properly on clean shutdown. > > > > > > > > The problem with GC is that if the consumer pauses for a long time, > ZK > > > > server won't receive pings from the client and thus can expire a > still > > > > existing session. > > > > > > > > The best thing to do here is to fix the GC issue since it may have > > other > > > > implications. To start with, you probably want to enable GC logging > and > > > see > > > > how long and how frequent your GCs are. > > > > > > > > Thanks, > > > > > > > > Jun > > > > > > > > On Thu, Mar 28, 2013 at 12:23 AM, Yonghui Zhao < > zhaoyong...@gmail.com > > > > >wrote: > > > > > > > > > I used zookeeper-3.3.4 in kafka. > > > > > > > > > > Default tickTime is 3 seconds, minSesstionTimeOut is 6 seconds. > > > > > Now I change tickTime to 5 seconds. minSesstionTimeOut to 10 > seconds > > > > > But if we change timeout to a larger one, > > > > > "you have shutdown this broker and restarted it faster than the > > > zookeeper > > > > > timeout so it appears to be re-registering." > > > > > this could happened more easily > > > > > > > > > > Do you think consumer GC will affect kafka server and zk > connection? > > > > > > > > > > > > > > > > > > > > 2013/3/28 Jun Rao <jun...@gmail.com> > > > > > > > > > > > Not sure why the re-registration fails. Are you using ZK 3.3.4 or > > > > above? > > > > > > > > > > > > It seems that you consumer still GCs, which is the root cause. > So, > > > you > > > > > will > > > > > > need to tune the GC setting further. Another way to avoid ZK > > session > > > > > > timeout is to increase the session timeout config. > > > > > > > > > > > > Thanks, > > > > > > > > > > > > Jun > > > > > > > > > > > > On Wed, Mar 27, 2013 at 8:35 PM, Yonghui Zhao < > > zhaoyong...@gmail.com > > > > > > > > > > wrote: > > > > > > > > > > > > > Now I used GC like this: > > > > > > > > > > > > > > -server -Xms1536m -Xmx1536m -XX:NewSize=128m > -XX:MaxNewSize=128m > > > > > > > -XX:+UseConcMarkSweepGC -XX:+UseParNewGC > > > > > > > -XX:CMSInitiatingOccupancyFraction=70 > > > > > > > > > > > > > > > > > > > > > But it still happened. It seems kafka server reconnect with > zk, > > > but > > > > > the > > > > > > > old node was still there. So kafka server stopped. > > > > > > > Can kafka server retry to connect with zk? > > > > > > > > > > > > > > > > > > > > > 2013-03-27 22:15:03,529] INFO Opening socket connection to > server > > > > > > > localhost/ > > > > > > > 127.0.0.1:2181 (org.apache.zookeeper.ClientCnxn) > > > > > > > [2013-03-27 22:15:03,529] INFO Socket connection established to > > > > > > localhost/ > > > > > > > 127.0.0.1:2181, initiating session > > > (org.apache.zookeeper.ClientCnxn) > > > > > > > [2013-03-27 22:15:05,855] INFO Session establishment complete > on > > > > server > > > > > > > localhost/127.0.0.1:2181, sessionid = 0x13da6d94abf00aa, > > > negotiated > > > > > > > timeout > > > > > > > = 6000 (org.apache.zookeeper.ClientCnxn) > > > > > > > [2013-03-27 22:15:05,942] INFO zookeeper state changed > > > > (SyncConnected) > > > > > > > (org.I0Itec.zkclient.ZkClient) > > > > > > > [2013-03-27 22:15:14,912] INFO conflict in /brokers/ids/0 data: > > > > > > > 127.0.0.1-1364393691770:127.0.0.1:9093 stored data: null > > > > > > > (kafka.utils.ZkUtils$) > > > > > > > [2013-03-27 22:15:14,942] ERROR Error handling event > ZkEvent[New > > > > > session > > > > > > > event sent to > > > > > kafka.server.KafkaZooKeeper$SessionExpireListener@18f389bc > > > > > > ] > > > > > > > (org.I0Itec.zkclient.ZkEventThread) > > > > > > > java.lang.RuntimeException: A broker is already registered on > the > > > > path > > > > > > > /brokers/ids/0. This probably indicates that you either have > > > > > configured a > > > > > > > brokerid that is already in use, or else you have shutdown this > > > > broker > > > > > > and > > > > > > > restarted it faster than the zookeeper timeout so it appears to > > be > > > > > > > re-registering. > > > > > > > at > > > > > > > > > > > > kafka.server.KafkaZooKeeper.registerBrokerInZk(KafkaZooKeeper.scala:57) > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > kafka.server.KafkaZooKeeper$SessionExpireListener.handleNewSession(KafkaZooKeeper.scala:100) > > > > > > > at org.I0Itec.zkclient.ZkClient$4.run(ZkClient.java:472) > > > > > > > at > > org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) > > > > > > > [2013-03-27 22:15:33,736] INFO Closing socket connection to / > > > > 127.0.0.1 > > > > > . > > > > > > > (kafka.network.Processor) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2013/3/27 Neha Narkhede <neha.narkh...@gmail.com> > > > > > > > > > > > > > > > The kafka-server-start.sh script doesn't have the mentioned > GC > > > > > > > > settings and heap size configured. However, probably doing > that > > > is > > > > a > > > > > > > > good idea. > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Neha > > > > > > > > > > > > > > > > On Tue, Mar 26, 2013 at 9:47 AM, Yonghui Zhao < > > > > zhaoyong...@gmail.com > > > > > > > > > > > > > > wrote: > > > > > > > > > kafka server is started by bin/kafka-server-start.sh. No > gc > > > > > setting. > > > > > > > > > 在 2013-3-26 下午11:40,"Neha Narkhede" < > neha.narkh...@gmail.com > > > >写道: > > > > > > > > > > > > > > > > > >> Did you have a gc pause around that time on the server ? > > What > > > > are > > > > > > your > > > > > > > > >> server's current gc settings ? > > > > > > > > >> > > > > > > > > >> Thanks, > > > > > > > > >> Neha > > > > > > > > >> > > > > > > > > >> On Mon, Mar 25, 2013 at 8:48 PM, Yonghui Zhao < > > > > > > zhaoyong...@gmail.com> > > > > > > > > >> wrote: > > > > > > > > >> > Thanks Neha, btw have you seen this exception. We > didn't > > > > > restart > > > > > > > any > > > > > > > > >> > service it happens in deep night. > > > > > > > > >> > > > > > > > > > >> > java.lang.RuntimeException: A broker is already > registered > > > on > > > > > the > > > > > > > path > > > > > > > > >> > /brokers/ids/0. This probably indicates that you either > > have > > > > > > > > configured a > > > > > > > > >> > brokerid that is already in use, or else you have > shutdown > > > > this > > > > > > > broker > > > > > > > > >> and > > > > > > > > >> > restarted it faster than the zookeeper timeout so it > > appears > > > > to > > > > > be > > > > > > > > >> > re-registering. > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > > > > > > > > kafka.server.KafkaZooKeeper.registerBrokerInZk(KafkaZooKeeper.scala:57) > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > kafka.server.KafkaZooKeeper$SessionExpireListener.handleNewSession(KafkaZooKeeper.scala:100) > > > > > > > > >> > at > > org.I0Itec.zkclient.ZkClient$4.run(ZkClient.java:472) > > > > > > > > >> > at > > > > > > org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) > > > > > > > > >> > [2013-03-26 02:07:19,155] INFO re-registering broker > info > > in > > > > ZK > > > > > > for > > > > > > > > >> broker > > > > > > > > >> > 0 (kafka.server.KafkaZooKeeper) > > > > > > > > >> > [2013-03-26 02:07:19,155] INFO Registering broker > > > > /brokers/ids/0 > > > > > > > > >> > (kafka.server.KafkaZooKeeper) > > > > > > > > >> > [2013-03-26 02:07:19,611] INFO conflict in > /brokers/ids/0 > > > > data: > > > > > > > > >> > 127.0.0.1-1364234839275:127.0.0.1:9093 stored data: > > > > > > > > >> 127.0.0.1-1364227372971: > > > > > > > > >> > 127.0.0.1:9093 (kafka.utils.ZkUtils$) > > > > > > > > >> > [2013-03-26 02:07:19,611] ERROR Error handling event > > > > ZkEvent[New > > > > > > > > session > > > > > > > > >> > event sent to > > > > > > > > kafka.server.KafkaZooKeeper$SessionExpireListener@40f8c9bf > > > > > > > > >> ] > > > > > > > > >> > (org.I0Itec.zkclient.ZkEventThread) > > > > > > > > >> > java.lang.RuntimeException: A broker is already > registered > > > on > > > > > the > > > > > > > path > > > > > > > > >> > /brokers/ids/0. This probably indicates that you either > > have > > > > > > > > configured a > > > > > > > > >> > brokerid that is already in use, or else you have > shutdown > > > > this > > > > > > > broker > > > > > > > > >> and > > > > > > > > >> > restarted it faster than the zookeeper timeout so it > > appears > > > > to > > > > > be > > > > > > > > >> > re-registering. > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > > > > > > > > kafka.server.KafkaZooKeeper.registerBrokerInZk(KafkaZooKeeper.scala:57) > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > kafka.server.KafkaZooKeeper$SessionExpireListener.handleNewSession(KafkaZooKeeper.scala:100) > > > > > > > > >> > at > > org.I0Itec.zkclient.ZkClient$4.run(ZkClient.java:472) > > > > > > > > >> > at > > > > > > org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > 2013/3/26 Neha Narkhede <neha.narkh...@gmail.com> > > > > > > > > >> > > > > > > > > > >> >> That really depends on your consumer application's > memory > > > > > > > allocation > > > > > > > > >> >> patterns. If it is a thin wrapper over a Kafka > consumer, > > I > > > > > would > > > > > > > > imagine > > > > > > > > >> >> you can get away with using CMS for the tenured > > generation > > > > and > > > > > > > > parallel > > > > > > > > >> >> collector for the new generation with a small heap like > > 1gb > > > > or > > > > > > so. > > > > > > > > >> >> > > > > > > > > >> >> Thanks, > > > > > > > > >> >> Neha > > > > > > > > >> >> > > > > > > > > >> >> On Monday, March 25, 2013, Yonghui Zhao wrote: > > > > > > > > >> >> > > > > > > > > >> >> > Any suggestion on consumer side? > > > > > > > > >> >> > 在 2013-3-25 下午9:49,"Neha Narkhede" < > > > > neha.narkh...@gmail.com > > > > > > > > >> <javascript:;> > > > > > > > > >> >> > >写道: > > > > > > > > >> >> > > > > > > > > > >> >> > > For Kafka 0.7 in production at Linkedin, we use a > > heap > > > of > > > > > > size > > > > > > > > 3G, > > > > > > > > >> new > > > > > > > > >> >> > gen > > > > > > > > >> >> > > 256 MB, CMS collector with occupancy of 70%. > > > > > > > > >> >> > > > > > > > > > > >> >> > > Thanks, > > > > > > > > >> >> > > Neha > > > > > > > > >> >> > > > > > > > > > > >> >> > > On Sunday, March 24, 2013, Yonghui Zhao wrote: > > > > > > > > >> >> > > > > > > > > > > >> >> > > > Hi Jun, > > > > > > > > >> >> > > > > > > > > > > > >> >> > > > I used kafka-server-start.sh to start kafka, > there > > is > > > > > only > > > > > > > one > > > > > > > > jvm > > > > > > > > >> >> > > setting > > > > > > > > >> >> > > > "-Xmx512M“ > > > > > > > > >> >> > > > > > > > > > > > >> >> > > > Do you have some recommend GC setting? Usually > > our > > > > > sever > > > > > > > has > > > > > > > > >> 32GB > > > > > > > > >> >> or > > > > > > > > >> >> > > 64GB > > > > > > > > >> >> > > > RAM. > > > > > > > > >> >> > > > > > > > > > > > >> >> > > > 2013/3/22 Jun Rao <jun...@gmail.com> > > > > > > > > >> >> > > > > > > > > > > > >> >> > > > > A typical reason for many rebalancing is the > > > consumer > > > > > > side > > > > > > > > GC. > > > > > > > > >> If > > > > > > > > >> >> so, > > > > > > > > >> >> > > you > > > > > > > > >> >> > > > > will see logs in the consume saying sth like > > > "expired > > > > > > > > session" > > > > > > > > >> for > > > > > > > > >> >> > ZK. > > > > > > > > >> >> > > > > Occasional rebalances are fine. Too many > > rebalances > > > > can > > > > > > > slow > > > > > > > > >> down > > > > > > > > >> >> the > > > > > > > > >> >> > > > > consumption and you will need to tune your GC > > > > setting. > > > > > > > > >> >> > > > > > > > > > > > > >> >> > > > > Thanks, > > > > > > > > >> >> > > > > > > > > > > > > >> >> > > > > Jun > > > > > > > > >> >> > > > > > > > > > > > > >> >> > > > > On Thu, Mar 21, 2013 at 11:07 PM, Yonghui Zhao > < > > > > > > > > >> >> > zhaoyong...@gmail.com > > > > > > > > >> >> > > > > >wrote: > > > > > > > > >> >> > > > > > > > > > > > > >> >> > > > > > Yes, before consumer exception: > > > > > > > > >> >> > > > > > > > > > > > > > >> >> > > > > > 2013/03/21 12:07:17.909 INFO > > > > > > [ZookeeperConsumerConnector] > > > > > > > > [] > > > > > > > > >> >> > > > > > 0_lg-mc-db01.bj-1363784482043-f98c7868 *end > > > > > rebalancing > > > > > > > > >> >> > > > > > > consumer*0_lg-mc-db01.bj-1363784482043-f98c7868 > > > try > > > > > #0 > > > > > > > > >> >> > > > > > 2013/03/21 12:07:17.911 INFO > > > > > > [ZookeeperConsumerConnector] > > > > > > > > [] > > > > > > > > >> >> > > > > > 0_lg-mc-db01.bj-1363784482043-f98c7868 *begin > > > > > > rebalancing > > > > > > > > >> >> > > > > > > consumer*0_lg-mc-db01.bj-1363784482043-f98c7868 > > > try > > > > > #0 > > > > > > > > >> >> > > > > > 2013/03/21 12:07:17.934 INFO > [FetcherRunnable] > > [] > > > > > > > > >> FetchRunnable-0 > > > > > > > > >> >> > > start > > > > > > > > >> >> > > > > > fetching topic: sms part: 0 offset: > 43667888259 > > > > from > > > > > > > > >> >> > 127.0.0.1:9093 > > > > > > > > >> >> > > > > > 2013/03/21 12:07:17.940 INFO [SimpleConsumer] > > [] > > > > > > > Reconnect > > > > > > > > in > > > > > > > > >> >> > > > multifetch > > > > > > > > >> >> > > > > > due to socket error: > > > > > > > > >> >> > > > > > > java.nio.channels.*ClosedByInterruptException* > > > > > > > > >> >> > > > > > at > > > > > > > > >> java.nio.channels.spi.*AbstractInterruptibleChannel* > > > > > > > > >> >> > > > > > .end(AbstractInterruptibleChannel.java:201) > > > > > > > > >> >> > > > > > > > > > > > > > >> >> > > > > > > > > > > > > > >> >> > > > > > 2013/03/21 12:07:17.978 INFO > > > > > > [ZookeeperConsumerConnector] > > > > > > > > [] > > > > > > > > >> >> > > > > > 0_lg-mc-db01.bj-1363784482043-f98c7868 *end > > > > > rebalancing > > > > > > > > >> >> > > > > > > consumer*0_lg-mc-db01.bj-1363784482043-f98c7868 > > > try > > > > > #0 > > > > > > > > >> >> > > > > > 2013/03/21 12:07:18.004 INFO > [FetcherRunnable] > > [] > > > > > > > > >> FetchRunnable-0 > > > > > > > > >> >> > > start > > > > > > > > >> >> > > > > > fetching topic: sms part: 0 offset: > 43667888259 > > > > from > > > > > > > > >> >> > 127.0.0.1:9093 > > > > > > > > >> >> > > > > > 2013/03/21 12:07:18.066 INFO > > > > > > [ZookeeperConsumerConnector] > > > > > > > > [] > > > > > > > > >> >> > > > > > 0_lg-mc-db01.bj-1363784482043-f98c7868 *begin > > > > > > rebalancing > > > > > > > > >> >> consume*r > > > > > > > > >> >> > > > > > 0_lg-mc-db01.bj-1363784482043-f98c7868 try #0 > > > > > > > > >> >> > > > > > 2013/03/21 12:07:18.176 INFO [SimpleConsumer] > > [] > > > > > > > Reconnect > > > > > > > > in > > > > > > > > >> >> > > > multifetch > > > > > > > > >> >> > > > > > due to socket error: > > > > > > > > >> >> > > > > > > java.nio.channels.*ClosedByInterruptException* > > > > > > > > >> >> > > > > > at > > > > > > > > >> java.nio.channels.spi.*AbstractInterruptibleChannel* > > > > > > > > >> >> > > > > > .end(AbstractInterruptibleChannel.java:201) > > > > > > > > >> >> > > > > > > > > > > > > > >> >> > > > > > > > > > > > > > >> >> > > > > > So you think it is normal? How can we avoid > > this > > > > > > > exception? > > > > > > > > >> >> > > > > > > > > > > > > > >> >> > > > > > I used 4 partitions in kafka, use only 1 > > > > partition? > > > > > > > > >> >> > > > > > > > > > > > > > >> >> > > > > > > > > > > > > > >> >> > > > > > > > > > > > > > >> >> > > > > > 2013/3/22 Jun Rao <jun...@gmail.com> > > > > > > > > >> >> > > > > > > > > > > > > > >> >> > > > > > > Do you see any rebalances in the consumer? > > Each > > > > > > > rebalance > > > > > > > > >> will > > > > > > > > >> >> > > > > interrupt > > > > > > > > >> >> > > > > > > existing fetcher threads first. > > > > > > > > >> >> > > > > > > > > > > > > > > >> >> > > > > > > Thanks, > > > > > > > > >> >> > > > > > > > > > > > > > > >> >> > > > > > > Jun > > > > > > > > >> >> > > > > > > > > > > > > > > >> >> > > > > > > On Thu, Mar 21, 2013 at 9:40 PM, Yonghui > > Zhao < > > > > > > > > >> >> > > zhaoyong...@gmail.com > > > > > > > > >> >> > > > > > > > > > >> >> > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >